Implement lance.write_table API and test the simplest round trips #23
Review thread on python/lance/__init__.py (commit "add parameters"; this diff is now outdated):
    ----------
    table : pa.Table
        Apache Arrow Table
    sink : str or `Path`
        return ds.dataset(uri, format=fmt)


    def write_table(table: pa.Table, destination: Union[str, Path], primary_key: str):
Comment: Maybe add a convenience to auto-generate a pk column?

Reply: Should we push that to the application / DB level?

Reply: If we want people to use it as a Python library, then it's probably a good idea to have it. Could be in a wrapper function or something? It should also check for uniqueness there as well.
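A minimal sketch of what such a convenience wrapper could look like, assuming pyarrow and the write_table signature above. write_table_with_pk is a hypothetical name, not something this PR adds:

```python
import pyarrow as pa
import pyarrow.compute as pc

import lance  # provides write_table as added in this PR


def write_table_with_pk(table: pa.Table, destination, primary_key: str = "pk"):
    """Hypothetical wrapper: auto-generate a pk column and check uniqueness."""
    if primary_key not in table.column_names:
        # Auto-generate a monotonically increasing int64 pk column.
        pk = pa.array(range(table.num_rows), type=pa.int64())
        table = table.append_column(primary_key, pk)
    else:
        # Enforce uniqueness of a user-supplied pk column.
        if len(pc.unique(table.column(primary_key))) != table.num_rows:
            raise ValueError(f"column {primary_key!r} is not unique")
    lance.write_table(table, destination, primary_key)
```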
Second review thread, on the same write_table definition:
Comment: So this requires holding everything in memory first, right? If we have a bunch of images on S3, does this mean we need to hold them all in Arrow memory to convert to Lance format?

Reply: Right, so there will be a StreamWriter, which basically opens a DatasetWriter and writes batch records one by one. It is another set of interfaces though. Similar to Parquet: https://arrow.apache.org/docs/cpp/parquet.html#writing-parquet-files
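For illustration, a batch-at-a-time interface in the spirit of pyarrow.parquet.ParquetWriter might look like the sketch below. LanceStreamWriter and its methods are hypothetical and not part of this PR:

```python
import pyarrow as pa


class LanceStreamWriter:
    """Hypothetical streaming interface, modeled on pyarrow.parquet.ParquetWriter.

    Nothing here exists in this PR; it only illustrates the shape of the
    batch-at-a-time API discussed above.
    """

    def __init__(self, destination: str, schema: pa.Schema, primary_key: str):
        self.destination = destination
        self.schema = schema
        self.primary_key = primary_key
        # A real implementation would open a DatasetWriter here.

    def write_batch(self, batch: pa.RecordBatch) -> None:
        # A real implementation would encode and flush this batch, so only
        # one batch needs to be held in Arrow memory at a time.
        raise NotImplementedError

    def close(self) -> None:
        raise NotImplementedError


# Usage sketch: stream record batches (e.g. images fetched from S3) one by one.
# writer = LanceStreamWriter("s3://bucket/images.lance", schema, primary_key="id")
# for batch in batches:
#     writer.write_batch(batch)
# writer.close()
```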
Implements the lance.write_table() API to write lance data and read it back. Closes #3.
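For reference, the simplest round trip this PR tests might look like the following. lance.dataset is assumed here to be the helper wrapping ds.dataset(uri, format=fmt) seen in the diff context above; the name is a guess from that context:

```python
import pyarrow as pa

import lance

# Build a small table with an explicit primary-key column.
table = pa.table({"pk": [1, 2, 3], "value": ["a", "b", "c"]})

# Write it out in Lance format (the API added in this PR).
lance.write_table(table, "/tmp/roundtrip.lance", primary_key="pk")

# Read it back through the dataset helper and verify the round trip.
dataset = lance.dataset("/tmp/roundtrip.lance")
assert dataset.to_table().equals(table)
```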